Installation

  1. Install R
  2. Install RTools if you are on Windows
  3. Install RStudio

For more details, see Software and Package Versions.

Running This Code

  1. Ensure the installation steps above are completed
  2. Download a zip of the code and data here and unzip it
  3. In RStudio, open the src/src.Rproj file
  4. Then, open the src/index.Rmd file
  5. In RStudio:
    • Run all code: Click the Run drop down (top right of the code pane) and click Run All
    • Generate HTML version: Click knit (top left of code pane) and a file will be generated in docs/index.html

Libraries

Install R packages if needed:

  • tidyverse: general data processing package that works with tabular structured data
  • lubridate: convenient functions for handling date and time data
  • sf: spatially enables the tidyverse package to extend functionality for spatial data
  • ggplot2: package for creating plots, including maps of sf objects
  • tmap: useful for quick static and interactive maps of sf objects
  • ggspatial: makes plotting general map elements like base maps, north arrows and scale bars easier
  • prettymapr: required dependency for ggspatial to create base maps
install.packages("tidyverse")
install.packages("lubridate")
install.packages("sf")
install.packages("tmap")
install.packages("ggspatial")
install.packages("ggplot2")
install.packages("prettymapr")

Load libraries.

library(tidyverse)
library(lubridate)
library(sf)
library(ggspatial)
library(tmap)
library(ggplot2)

Tutorial

For this tutorial, we will be using the following data mainly:

Later in the tutorial, we will also use:

All data is available in the data folder.

Reading Spatial Data

To read spatial data use the read_sf function.

Note: For shapefiles, simply use extension .shp.

ksi <- read_sf("../data/toronto-ksi.geojson")
nbhoods <- read_sf("../data/toronto-nbhoods.geojson")

Preview Non-spatial Data

Non-spatial data works similarly to standard data.frame structures, except the addition of a geometry column usually at the end of the columns.

Let’s preview the non-spatial KSI data:

ksi

and also the Neighbourhoods data:

nbhoods

Most dplyr functions from tidyverse will work with sf objects, allowing you to easily manipulate the non-spatial portions of the read spatial data.

For example, we get the number of KSI collisions each year using the year function from lubridate and count from dplyr:

ksi %>%
  count(year(DATE))

Preview Spatial Data

You can preview spatial data using plot, which generates several maps of up to the first 9 columns as the default.

Let’s preview the KSI data:

plot(ksi)

We can also preview a specific variable:

plot(ksi["ACCLASS"])

and again for the Neighbourhoods data:

nbhoods

Preview Spatial Data Interactively

In addition to previewing spatial data statically, we can utilize the tmap package to preview spatial data interactively.

Let’s try it with the KSI data (we limit it to the first 1000 records due to the size of the data):

  • tmap_mode sets the visualization mode to interactive
  • tm_shape tells the tmap package to use the ksi data
  • head filters for the first 1000 rows
  • tm_dots visualizes the data as point data
tmap_mode("view") # set to interactive mode

tm_shape(ksi %>% head(1000)) +
  tm_dots(
    col = "ACCLASS", # color based on column
    clustering = TRUE, # group points together for cleaner visuals
    popup.vars = TRUE # allow user to click each point to inspect data
  )

We can also do the same for the Neighbourhoods data but with tm_polygons from tmap instead:

tmap_mode("view")

tm_shape(nbhoods) +
  tm_polygons(
        col = "#336699", # set polygon color
        border.col = "white", # set border color
        popup.vars = TRUE
    )

Coordinate Reference Systems (CRS)

Each sf object should ideally have a defined CRS that affects the accuracy of coordinates, units of measurement, and spatially related calculations.

For the KSI data, we can view the CRS with st_crs, which is usually WGS84 (World Geodetic System 1984, a global standard that is often used in most spatial data) with public registry reference EPSG:4326:

st_crs(ksi)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]

We also see the same CRS is used for the neighbourhoods data:

st_crs(nbhoods)
## Coordinate Reference System:
##   User input: WGS 84 
##   wkt:
## GEOGCRS["WGS 84",
##     DATUM["World Geodetic System 1984",
##         ELLIPSOID["WGS 84",6378137,298.257223563,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     ID["EPSG",4326]]

In most situations, you do not need to modify or define the CRS as the sf package handles accurate calculations and most CRS are distributed in WGS84, but we should always double check.

To define the CRS if it is not given in the spatial data, we use st_set_crs with the correct EPSG number (4326 in our case), which can be found on espg.io.

As an example, we will use a separate example such as the nc.shp file from the sf package, but set the CRS to NA as a demonstration:

# Read example data
exdata <- read_sf(
  system.file("shape/nc.shp", package="sf"),
  crs = NA
)

# View CRS
st_crs(exdata)
## Coordinate Reference System: NA

Now can set the CRS to the correct one, which is NAD27 with EPSG:4267:

# Set CRS to NAD27
exdata <- exdata %>%
  st_set_crs(4267)

# View CRS again after setting it to NAD27
st_crs(exdata)
## Coordinate Reference System:
##   User input: EPSG:4267 
##   wkt:
## GEOGCRS["NAD27",
##     DATUM["North American Datum 1927",
##         ELLIPSOID["Clarke 1866",6378206.4,294.978698213898,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["Geodesy."],
##         AREA["North and central America: Antigua and Barbuda - onshore. Bahamas - onshore plus offshore over internal continental shelf only. Belize - onshore. British Virgin Islands - onshore. Canada onshore - Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Quebec, Saskatchewan and Yukon - plus offshore east coast. Cuba - onshore and offshore. El Salvador - onshore. Guatemala - onshore. Honduras - onshore. Panama - onshore. Puerto Rico - onshore. Mexico - onshore plus offshore east coast. Nicaragua - onshore. United States (USA) onshore and offshore - Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming - plus offshore . US Virgin Islands - onshore."],
##         BBOX[7.15,167.65,83.17,-47.74]],
##     ID["EPSG",4267]]

Now that there is a defined CRS, we can reproject them with st_transform to a different CRS, such as WGS84 (EPSG:4326), in case we do not have consistent CRS across different spatial data:

exdata %>%
  st_transform(4326)

It is important to ensure that all your spatial data is in the same CRS before mapping or conducting any spatially related calculations or processes.

Spatial Buffers

A common spatial data processing task is to apply buffers (radial extensions of spatial geometries) to spatial objects, which can be done using the st_buffer function.

Let’s try this for a couple of records in the KSI data, which are originally points as seen below:

# Select the first 3 records of KSI
ksi_buff <- ksi %>%
  head(3)

# Display the KSI geometry as points
ksi_buff %>%
  select(geometry) %>%
  plot

Then we can buffer the points in the KSI data by 1000 meters (unit of measure follows the CRS, which is meters for WGS84):

# Buffer the point geoms by 1000m
ksi_buff <- ksi_buff %>%
  select(geometry) %>%
  st_buffer(dist = 1000)

# Plot polygon geometries after buffer
ksi_buff %>%
  select(geometry) %>%
  plot

Applying buffers to points is useful for capturing other spatial objects within a certain distance of each point - relating captured spatial objects to each point.

Spatial Joins

Another commonly used spatial function is spatial joins, which can be done using the st_join function.

Let’s try getting the number of KSI collisions in 2022 within each Neighbourhood.

First, we filter out all KSI collisions for year 2022:

# Filter for 2022 data
ksi_2022 <- ksi %>%
  filter(year(DATE) == 2022)

# Preview the data
ksi_2022 %>% as_tibble

Then, we must join each KSI point to each neighbourhood polygon:

# Join KSI data to be within each neighbourhood polygon
nbhoods_join <- nbhoods %>%
  st_join(ksi_2022)

# Preview the joined data
nbhoods_join %>% as_tibble

Finally, we can apply the count function to get the number of collisions per neighbourhood in column n:

# Count the joined ksis per nbhood
nbhoods_ksi_2022 <- nbhoods_join %>%
  count(AREA_ID, AREA_NAME)

# Preview ksi per nbhood in 2022
nbhoods_ksi_2022 %>% as_tibble

We can also plot these spatially:

plot(nbhoods_ksi_2022["n"])

Making Maps for Publications

The plot function is convenient for previewing spatial data quickly, but makes it difficult to add standard map elements such as base maps, north arrows and scale bars.

For publications, we can use the ggspatial package to create publication ready maps.

As an example, we will utilize our joined 2022 KSI collision counts for each neighbourhood in Toronto, and create a publication ready map.

We will also load the Automated Speed Enforcement (ASE) data for Toronto to map an extra layer on top of the neighbourhood collision data.

ase <- read_sf("../data/toronto-ase.geojson")

Now that we have all the data we need, let’s create a publication ready map:

# Create pub ready map
ksi_map <- ggplot() +
  annotation_map_tile(
    zoomin = 1, # affects resolution of base map tiles
    type = "cartolight", # type of base map
    cachedir = "../data/cache" # cache the map tiles to avoid redownload
  ) +
  annotation_north_arrow(
    width = unit(0.2, "cm"),
    height = unit(0.5, "cm"),
    location = "tr" # where to place the arrow
  ) +
  annotation_scale(
    style = "ticks",
    location = "br" # where to place the scale bar
  ) +
  labs(
    title = "2022 Neighbourhood Collisions (Toronto, ON, CA)", # map title
    caption = "Data Source(s): Toronto Police Service Public Safety Portal, City of Toronto Open Data\nCRS: WGS84 / *Automated Speed Enforcement (ASE), Killed or Seriously Injured (KSI)" # map note
  ) +
  layer_spatial( # add spatial data layer for collisions
    nbhoods_ksi_2022, 
    aes(fill = n) # set the fill color to n, which is num of collisions
  ) +
  layer_spatial( # add spatial data layer for ASE
    ase,
    fill = NA,
    aes(color = "ASE* Camera") # set name of points for ASE
  ) +
  scale_color_manual(
    values = c("ASE* Camera" = "salmon") # change ase point color
  ) +
  guides(
    fill = guide_legend(
      order = 1,
      title = "KSI* Collisions" # legend title for collisions
    ),
    color = guide_legend(
      order = 2,
      title = element_blank() # remove legend title for ase
    )
  ) +
  theme(
    plot.caption = element_text(size = 6, color = "darkgray"), # alter note text
    legend.key = element_blank(), # remove grey legend background in ase points
    plot.title = element_text(hjust = 0.5) # center map title
  )

# Show pub ready map
ksi_map

Saving Maps

We can save the map we made in the previous section into a file with ggsave - ready to be included in a publication:

# Save to a pdf
ggsave(
  "../figures/fig-ksi-2022.pdf",
  plot = ksi_map,
  width = 8,
  height = 5,
  units = "in",
  dpi = 600
)

# Save to a png
ggsave(
  "../figures/fig-ksi-2022.png",
  plot = ksi_map,
  width = 8,
  height = 5,
  units = "in",
  dpi = 600
)

Writing Spatial Data

Now that we are done processing our data, we should save the results to a file with the write_sf function.

Let’s save our joined 2022 KSI collision counts for each neighbourhood in the data folder:

# Save a geojson file
nbhoods_ksi_2022 %>%
  write_sf("../data/toronto-nbhoods-ksi-2022.geojson")

# Save a shapefile
nbhoods_ksi_2022 %>%
  write_sf("../data/toronto-nbhoods-ksi-2022.shp")

Later on, we can read the saved shapefile back in:

# Read the shapefile back in
nbhoods_ksi_2022 <- read_sf("../data/toronto-nbhoods-ksi-2022.shp")

# Preview the data
nbhoods_ksi_2022 %>% as_tibble

Other Tutorials

That concludes the tutorial! For further information and other tutorials, see:

Other GIS Software

The visualization and mapping of spatial data is relatively complicated to get right, thus other alternatives to designing and creating maps more interactively and manually are:

  • QGIS: free and open source GIS software available for all operating systems
  • ArcGIS Pro: paid industry GIS software for Windows

Software and Package Versions

R and R package versions:

## R version 4.3.1 (2023-06-16)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.2.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Toronto
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lubridate_1.9.3  forcats_1.0.0    stringr_1.5.0    dplyr_1.1.2     
##  [5] purrr_1.0.1      readr_2.1.4      tidyr_1.3.0      tibble_3.2.1    
##  [9] ggplot2_3.4.2    tidyverse_2.0.0  tmap_3.3-4       prettymapr_0.2.5
## [13] sf_1.0-16        ggspatial_1.1.9 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.0        viridisLite_0.4.2       farver_2.1.1           
##  [4] fastmap_1.1.1           leaflet_2.2.2           XML_3.99-0.16.1        
##  [7] digest_0.6.33           timechange_0.2.0        lifecycle_1.0.4        
## [10] ellipsis_0.3.2          terra_1.7-71            magrittr_2.0.3         
## [13] compiler_4.3.1          rlang_1.1.2             sass_0.4.7             
## [16] tools_4.3.1             utf8_1.2.3              yaml_2.3.7             
## [19] knitr_1.43              labeling_0.4.2          htmlwidgets_1.6.2      
## [22] sp_2.1-3                classInt_0.4-10         plyr_1.8.8             
## [25] RColorBrewer_1.1-3      abind_1.4-5             KernSmooth_2.23-21     
## [28] withr_2.5.0             leafsync_0.1.0          grid_4.3.1             
## [31] fansi_1.0.4             e1071_1.7-13            leafem_0.2.3           
## [34] colorspace_2.1-0        scales_1.2.1            dichromat_2.0-0.1      
## [37] cli_3.6.1               rmarkdown_2.23          ragg_1.2.5             
## [40] generics_0.1.3          rstudioapi_0.15.0       tzdb_0.4.0             
## [43] tmaptools_3.1-1         DBI_1.1.3               cachem_1.0.8           
## [46] proxy_0.4-27            stars_0.6-5             parallel_4.3.1         
## [49] rosm_0.3.0              s2_1.1.6                base64enc_0.1-3        
## [52] vctrs_0.6.4             jsonlite_1.8.7          hms_1.1.3              
## [55] systemfonts_1.0.4       crosstalk_1.2.1         jquerylib_0.1.4        
## [58] units_0.8-5             glue_1.6.2              lwgeom_0.2-14          
## [61] codetools_0.2-19        leaflet.providers_2.0.0 stringi_1.7.12         
## [64] gtable_0.3.3            raster_3.6-26           munsell_0.5.0          
## [67] pillar_1.9.0            htmltools_0.5.5         R6_2.5.1               
## [70] textshaping_0.3.6       wk_0.9.1                evaluate_0.21          
## [73] lattice_0.21-8          highr_0.10              png_0.1-8              
## [76] bslib_0.5.0             class_7.3-22            Rcpp_1.0.11            
## [79] xfun_0.39               pkgconfig_2.0.3

RStudio version:

## $citation
## To cite RStudio in publications use:
## 
##   Posit team (2023). RStudio: Integrated Development Environment for R.
##   Posit Software, PBC, Boston, MA. URL http://www.posit.co/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {RStudio: Integrated Development Environment for R},
##     author = {{Posit team}},
##     organization = {Posit Software, PBC},
##     address = {Boston, MA},
##     year = {2023},
##     url = {http://www.posit.co/},
##   }
## 
## $mode
## [1] "desktop"
## 
## $version
## [1] '2023.6.1.524'
## 
## $long_version
## [1] "2023.06.1+524"
## 
## $release_name
## [1] "Mountain Hydrangea"